Genome assembly of an Australian native grass species reveals a recent whole-genome duplication and biased gene retention of genes involved in stress response

Abstract
Background
The adaptive significance of polyploidy has been extensively debated, and chromosome-level genome assemblies of polyploids can provide insight into this. The Australian grass Bothriochloa decipiens belongs to the BCD clade, a group with a complex history of hybridization and polyploidy. This is the first genome assembly and annotation of a species that belongs to this fascinating yet complex group.

Findings
Using Illumina short reads, 10X Genomics linked reads, and Hi-C sequencing data, we assembled a highly contiguous genome of B. decipiens, with a total length of 1,218.22 Mb and scaffold N50 of 42.637 Mb. Comparative analysis revealed that the species experienced a relatively recent whole-genome duplication. We clustered the 20 major scaffolds, representing the 20 chromosomes, into the 2 subgenomes of the parental species using unique repeat signatures. We found evidence of biased fractionation and differences in the activity of transposable elements between the subgenomes prior to hybridization. Duplicates were enriched for genes involved in transcription and response to external stimuli, supporting a biased retention of duplicated genes following whole-genome duplication.

Conclusions
Our results support the hypotheses of a biased retention of duplicated genes following polyploidy and point to differences in repeat activity associated with subgenome dominance. B. decipiens is a widespread species with the ability to establish across many soil types, making it a prime candidate for climate change-resilient ecological restoration of Australian grasslands. This reference genome is a valuable resource for future population genomic research on Australian grasses.


Abstract: Background
The adaptive significance of polyploidy has been extensively debated, and chromosome-level genome assemblies of polyploids can provide insight into this topic. The Australian grass Bothriochloa decipiens belongs to the BCD clade, a group with a complex history of hybridization and polyploidy. This is the first genome assembly and annotation of a species that belongs to this fascinating yet complex group.

Findings
Using a combination of Illumina short reads, 10X Genomics linked reads, and Hi-C sequencing data, we assembled a highly contiguous genome of Bothriochloa decipiens, with a total length of 1,218.22 Mb and scaffold N50 of 42.637 Mb. Comparative analysis revealed that the species is a diploidized allotetraploid. We clustered the 20 major scaffolds, representing the 20 chromosomes, into the two subgenomes of the parental species using unique repeat signatures. We found evidence of biased fractionation and of differences in the activity of transposable elements between the subgenomes prior to hybridization. Duplicates were enriched for genes involved in transcription and in response to external stimuli such as drought, supporting a biased retention of duplicated genes following whole-genome duplication.

Conclusions
Our results support hypotheses explaining the biased retention of duplicated genes following polyploidy and point to differences in repeat activity associated with subgenome dominance. Bothriochloa decipiens is a widespread species with the ability to establish across many soil types, making it useful for ecological restoration of Australian grasslands. This reference genome is a valuable resource for future population genomic research involving Australian grasses, which may be helpful in ecological restoration projects.

Here we report a chromosome-level genome assembly, annotation, and comparative analysis of a species in the BCD clade, Bothriochloa decipiens. This is the first genome assembly and annotation of a species that belongs to this fascinating yet complex group. Our highly contiguous B. decipiens genome assembly showed clear evidence of recent paleopolyploidy.

Using repeat signatures that diverged between putative homeologous chromosomes, we were able to organise chromosomes into subgenomes, allowing estimation of the timing of the speciation event that preceded the most recent allopolyploidy event in this species. We further describe signatures of biased fractionation between subgenomes, as well as biases in the functional annotations of genes retained as duplicates or as single copies. This genome reference will act as an important resource for population-genomic analysis of the group and will aid our understanding of the rich history of allopolyploidy in the BCD clade and its evolutionary significance.

[…] Figure S1). Of these 20 scaffolds, ten pairs (with more than 50% matching across both scaffolds) were identified as pairs of homeologous scaffolds (Supplementary Figure S1) […] Figure 2B). Further, these pairs of B. decipiens scaffolds show large syntenic blocks of duplicated genes (Figure 2A). There was also evidence of rearrangements between the subgenomes: for example, a translocation from scaffold 18 (homeologous to scaffold 10) to scaffold 8, which as a result appears to contain regions from both subgenomes (Figure 2).
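The homeolog-pairing criterion (more than 50% matching across both scaffolds) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: we assume, since the text does not say, that "matching" means the fraction of each scaffold's length covered by alignments to its partner, and that this per-direction coverage has already been summarised from a whole-genome self-alignment. The function name `homeologous_pairs`, the input layout, and the toy values are hypothetical.

```python
from itertools import combinations


def homeologous_pairs(lengths, covered_bp, min_frac=0.5):
    """Pair scaffolds whose mutual alignment covers > min_frac of BOTH scaffolds.

    Hypothetical inputs (not the authors' pipeline):
      lengths:    {scaffold: length in bp}
      covered_bp: {(query, target): bp of `query` covered by alignments to `target`}
    """
    pairs = []
    for a, b in combinations(sorted(lengths), 2):
        frac_a = covered_bp.get((a, b), 0) / lengths[a]  # share of a matched in b
        frac_b = covered_bp.get((b, a), 0) / lengths[b]  # share of b matched in a
        # Assumed reading of "more than 50% matching across both scaffolds"
        if frac_a > min_frac and frac_b > min_frac:
            pairs.append((a, b))
    return pairs


# Toy example with made-up scaffold names, lengths and coverage values
lengths = {"s1": 100, "s2": 90, "s3": 120}
covered_bp = {
    ("s1", "s2"): 70, ("s2", "s1"): 65,   # mutual match: > 50% of each
    ("s1", "s3"): 20, ("s3", "s1"): 30,   # weak match both ways
    ("s3", "s2"): 100, ("s2", "s3"): 10,  # strong on s3 only (one-sided)
}
print(homeologous_pairs(lengths, covered_bp))  # → [('s1', 's2')]
```

Requiring the threshold on both scaffolds, rather than on either one, keeps a pair like s3 and s2 from being called homeologous when most of one scaffold matches but only a small part of the other does.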

Therefore, this translocation likely occurred after the allopolyploidization event.

[…] Instances of assignment to the alternate subgenome by the HMM occurred in three regions (scaffolds 9, 12, and 16), but these were regions of low kmer density and were therefore challenging to assign to subgenomes using this kmer-based approach. Also, these instances did not reflect reciprocal exchanges between the subgenomes. The impact of these ambiguities in subgenome assignment was examined by including and excluding these regions in downstream analyses involving subgenome identification (i.e., biased fractionation). We […]

The timeline of paleotetraploidy
We […] and Setaria italica as outgroups to identify the likely timing of the subgenome divergence in […]

Biases in gene retention between subgenomes
We analysed the collinear blocks between S. bicolor and B. decipiens to assess differences in […] (Table 3). The two subgenomes show asymmetric gene loss, with subgenome A retaining more genes than subgenome B ([…] 4B) and the dominance of subgenome A (Table 3) […]

Alongside genes reverting to single copies, many genes were retained as duplicates. Genes […]

The evolutionary history of the Andropogoneae
[…]

[…] pairs, and the model was used to identify and break putative misjoins, score prospective joins, […] using an inflation parameter (-I) of 3. We counted the occurrences and the total base pairs of each LTR subfamily obtained from the above clustering in the putative A and B subgenomes identified above. We identified LTR subfamilies that were three times more common in one of the subgenomes by both occurrence and bp count. We then determined whether these repeats overlapped with multiple A- or B-genome-preferred kmers, to confirm that the kmers represented longer repetitive sequences and marked repeat expansion that occurred just before the allopolyploidy event.
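The LTR screen described above (subfamilies three times more common in one subgenome by both occurrence and bp count) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: we assume each annotated LTR copy has already been labelled with its subfamily from the clustering step and assigned to subgenome A or B, we read "three times more common" as an at-least-3-fold ratio, and we compare raw totals without normalising for subgenome size. The names `tally_ltr` and `biased_subfamilies` and the toy data are hypothetical, and the subsequent check for overlap with A- or B-preferred kmers is not shown.

```python
from collections import defaultdict


def tally_ltr(copies):
    """Sum copy number and total bp per LTR subfamily in each subgenome.

    copies: iterable of (subfamily, subgenome, length_bp), one tuple per
    annotated LTR copy; subgenome is "A" or "B" (hypothetical input format).
    """
    counts = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})
    for subfamily, subgenome, length_bp in copies:
        counts[subfamily][subgenome][0] += 1          # occurrence
        counts[subfamily][subgenome][1] += length_bp  # base pairs
    return counts


def biased_subfamilies(counts, fold=3.0):
    """Label a subfamily "A" or "B" only if it is at least `fold` times more
    common in that subgenome by BOTH occurrence and bp (assumed threshold)."""
    biased = {}
    for subfamily, by_sg in counts.items():
        n_a, bp_a = by_sg["A"]
        n_b, bp_b = by_sg["B"]
        if n_a >= fold * n_b and bp_a >= fold * bp_b:
            biased[subfamily] = "A"
        elif n_b >= fold * n_a and bp_b >= fold * bp_a:
            biased[subfamily] = "B"
    return biased


# Toy input with made-up subfamily names and lengths
copies = [
    ("fam1", "A", 5000), ("fam1", "A", 4000), ("fam1", "A", 6000), ("fam1", "B", 3000),
    ("fam2", "A", 5000), ("fam2", "B", 5000),
    ("fam3", "B", 7000),
    ("fam4", "A", 100), ("fam4", "A", 100), ("fam4", "A", 100), ("fam4", "B", 1000),
]
print(biased_subfamilies(tally_ltr(copies)))  # → {'fam1': 'A', 'fam3': 'B'}
```

In the toy data, fam4 shows why both metrics are required: it is three times more frequent in A by copy number but carries less sequence there than in B, so it is not flagged.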

The predicted protein sequences obtained from the final run of MAKER were aligned to the […]